Monte Carlo methods are now an essential part of the statistician's toolbox, to the point of being more familiar to graduate students than the measure theoretic notions upon which they are based! We recall in this note some of the advances made in the design of Monte Carlo techniques towards their use in Statistics, referring to Robert and Casella (2004, 2010) for an in-depth coverage.

The basic Monte Carlo principle and its extensions

The most appealing feature of Monte Carlo methods [for a statistician] is that they rely on sampling and on probability notions, which are the bread and butter of our profession. Indeed, the foundation of Monte Carlo approximations is identical to the validation of empirical moment estimators in that the average

$$\frac{1}{T}\sum_{t=1}^{T} h(x_t), \qquad x_t \sim f(x), \tag{1}$$

is converging to the expectation $\mathbb{E}_f[h(X)]$ when $T$ goes to infinity. Furthermore, the precision of this approximation is exactly of the same kind as the precision of a statistical estimate, in that it usually evolves as $O(\sqrt{T})$, the standard error of the average decreasing as $O(1/\sqrt{T})$. Therefore, once a sample $x_1,\ldots,x_T$ is produced according to a distribution density $f$, all standard statistical tools, including bootstrap, apply to this sample (with the further appeal that more data points can be produced if deemed necessary). As illustrated by Figure 1, the variability due to a single Monte Carlo experiment must be accounted for when drawing conclusions about its output, and evaluations of the overall variability of the sequence of approximations are provided in Kendall et al. (2007). But the ease with which such methods are analysed and the systematic resort to statistical intuition explain in part why Monte Carlo methods are privileged over numerical methods.

Figure 1: Monte Carlo evaluation (1) of the expectation $\mathbb{E}[X^3/(1+X^2+X^4)]$ as a function of the number of simulations when $X \sim \mathcal{N}(\mu, 1)$ using (left) one simulation run and (right) 100 independent runs for (top) $\mu = 0$ and (bottom) $\mu = 2.5$.
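As a minimal sketch of the average (1), the following Python code evaluates $\mathbb{E}[X^3/(1+X^2+X^4)]$ for $X \sim \mathcal{N}(\mu, 1)$ and attaches a standard error to it. The use of NumPy, the function names, the seed and the sample sizes are illustrative assumptions and are not taken from the note.

```python
import numpy as np

def h(x):
    """Integrand of Figure 1: x^3 / (1 + x^2 + x^4)."""
    return x**3 / (1.0 + x**2 + x**4)

def monte_carlo_mean(mu, T, rng):
    """Average (1) over T draws from N(mu, 1), with its estimated standard error."""
    x = rng.normal(loc=mu, scale=1.0, size=T)
    values = h(x)
    # Central limit theorem: the error of the average is of order sd / sqrt(T).
    return values.mean(), values.std(ddof=1) / np.sqrt(T)

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
results = {}
for mu in (0.0, 2.5):
    for T in (1_000, 100_000):
        results[(mu, T)] = monte_carlo_mean(mu, T, rng)
        est, se = results[(mu, T)]
        print(f"mu={mu}, T={T}: estimate={est:.4f}, standard error={se:.4f}")
```

Since the integrand is an odd function, the exact value is 0 when $\mu = 0$, which gives a simple check of the output. Multiplying $T$ by 100 should shrink the reported standard error by roughly a factor of 10.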

The representation of integrals as expectations $\mathbb{E}_f[h(X)]$ is far from unique and there exist therefore many possible approaches to the above approximation. This range of choices corresponds to the importance sampling strategies (Rubinstein 1981) in Monte Carlo, based on the obvious identity

$$\mathbb{E}_f[h(X)] = \mathbb{E}_g[h(X) f(X)/g(X)]$$

provided the support of the density $g$ includes the support of $f$. Some choices of $g$ may however lead to appallingly poor performances of the resulting Monte Carlo estimates, in that the variance of the resulting empirical average may be infinite, a danger worth highlighting since often neglected while having a major impact on the quality of the approximations. From a statistical perspective, there exist some natural choices for the importance function $g$, based on Fisher information and analytical approximations to the likelihood function like the Laplace approximation (Rue et al. 2008), even though it is more robust to replace the normal distribution in the Laplace approximation with a $t$ distribution. The special case of Bayes
factors (Robert and Casella 2004)

$$B_{01}(x) = \int f(x|\theta)\pi_0(\theta)\,\mathrm{d}\theta \bigg/ \int f(x|\theta)\pi_1(\theta)\,\mathrm{d}\theta\,,$$

which drive Bayesian testing and model choice, and of their approximation has led to a specific class of importance sampling techniques known as bridge sampling (Chen et al. 2000), where the optimal importance function is made of a mixture of the posterior distributions corresponding to both models (assuming both parameter spaces can be mapped into the same $\Theta$). We want to stress here that an alternative approximation of marginal likelihoods relying on the use of harmonic means (Gelfand and Dey 1994; Newton and Raftery 1994) and of direct simulations from a posterior density has repeatedly been used in the literature, despite often suffering from infinite variance (and thus numerical instability). Another potentially very efficient approximation of Bayes factors is provided by Chib's (1995) representation, based on parametric estimates to the posterior distribution.
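The importance sampling identity and the warning about infinite variance can be illustrated with a small sketch. The setting is an assumption made for the example only: the target $f$ is a standard normal, $h(x) = x^2$ (so the exact value is 1), and two importance functions are compared, a Student $t$ with 3 degrees of freedom and a normal with standard deviation 0.5. The latter has lighter tails than $f$, so the weights $f/g$ are unbounded and the estimator has infinite variance. All function names are hypothetical.

```python
import math
import numpy as np

def log_target(x):
    """Log density of the target f = N(0, 1)."""
    return -0.5 * x**2 - 0.5 * math.log(2.0 * math.pi)

def log_student_t(x, df):
    """Log density of a Student t distribution with df degrees of freedom."""
    const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return const - 0.5 * (df + 1) * np.log1p(x**2 / df)

def log_normal(x, scale):
    """Log density of N(0, scale^2)."""
    return -0.5 * (x / scale)**2 - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def importance_estimate(h, x, log_g_values):
    """Estimate E_f[h(X)] from draws x ~ g using the weights f(x)/g(x)."""
    weights = np.exp(log_target(x) - log_g_values)
    return np.mean(h(x) * weights), weights

def h(x):
    return x**2  # E_f[h(X)] = 1 under the standard normal target

rng = np.random.default_rng(1)  # arbitrary seed
T = 100_000

# Heavy-tailed proposal: the weights f/g are bounded.
x_t = rng.standard_t(3, size=T)
est_t, w_t = importance_estimate(h, x_t, log_student_t(x_t, 3))

# Light-tailed proposal: the weights are unbounded, the variance is infinite.
x_n = rng.normal(0.0, 0.5, size=T)
est_n, w_n = importance_estimate(h, x_n, log_normal(x_n, 0.5))

print(f"t(3) proposal:      estimate={est_t:.4f}, largest weight={w_t.max():.2f}")
print(f"N(0, 0.25) proposal: estimate={est_n:.4f}, largest weight={w_n.max():.2f}")
```

With the $t$ proposal the weights never exceed about 1.17 and the estimate settles near 1. With the narrow normal proposal a few draws in the tails carry very large weights, so repeated runs with different seeds can give noticeably different answers.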



MCMC methods 

Markov chain Monte Carlo (MCMC) methods were proposed many years (Metropolis et al. 1953) before their impact in Statistics was truly felt. However, once Gelfand and Smith (1990) stressed the ultimate feasibility of producing a Markov chain with a given stationary distribution $f$, either via a Gibbs sampler that simulates each conditional distribution of $f$ in its turn, or via a Metropolis-Hastings algorithm based on a proposal $q(y|x)$ with acceptance probability [for a move from $x$ to $y$]

$$\min\left\{1, \frac{f(y)\,q(x|y)}{f(x)\,q(y|x)}\right\},$$

then the spectrum of manageable models grew immensely and almost instantaneously.
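The acceptance probability above can be sketched for the simplest case of a normal random walk proposal, for which $q(x|y) = q(y|x)$ and the ratio reduces to $f(y)/f(x)$. The target used here, $f(x) \propto \exp(-x^2/2)/(1+x^2+x^4)$, and the scale 1.2 are borrowed from the figures of this note; the function names, the seed and the chain length are assumptions made for illustration.

```python
import numpy as np

def log_f(x):
    """Log of the unnormalised target exp(-x^2/2) / (1 + x^2 + x^4)."""
    return -0.5 * x**2 - np.log(1.0 + x**2 + x**4)

def random_walk_mh(log_target, x0, scale, n_iter, rng):
    """Random walk Metropolis-Hastings with a N(x, scale^2) proposal."""
    chain = np.empty(n_iter)
    x, log_fx = x0, log_target(x0)
    accepted = 0
    for t in range(n_iter):
        y = x + scale * rng.normal()
        log_fy = log_target(y)
        # Symmetric proposal: accept with probability min{1, f(y)/f(x)}.
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if np.log(1.0 - rng.random()) < log_fy - log_fx:
            x, log_fx = y, log_fy
            accepted += 1
        chain[t] = x  # a rejected move repeats the current value
    return chain, accepted / n_iter

rng = np.random.default_rng(2)  # arbitrary seed
chain, rate = random_walk_mh(log_f, x0=0.0, scale=1.2, n_iter=50_000, rng=rng)
print(f"acceptance rate: {rate:.3f}")
print(f"estimate of E_f[X^3]: {np.mean(chain**3):.4f}")
```

Only the ratio $f(y)/f(x)$ enters the algorithm, so the normalising constant of $f$ is never needed. Because the target is symmetric around 0, the estimate of $\mathbb{E}_f[X^3]$ should be close to 0.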

Due to parallel developments at the time on graphical and hierarchical Bayesian models, like generalised linear mixed models (Zeger and Karim 1991), the wealth of multivariate models with available conditional distributions (and hence the potential of implementing the Gibbs sampler) was far from negligible, especially when the availability of latent variables became quasi universal due to the slice sampling representations (Damien et al. 1999; Neal 2003). (Although the adoption of Gibbs samplers has primarily taken place within Bayesian statistics, there is nothing that prevents an artificial augmentation of the data through such techniques.)

Figure 2: (left) Gibbs sampling approximation to the distribution $f(x) \propto \exp(-x^2/2)/(1+x^2+x^4)$ against the true density; (right) range of convergence of the approximation to $\mathbb{E}_f[X^3] = 0$ against the number of iterations using 100 independent runs of the Gibbs sampler, along with a single Gibbs run.

For instance, if the density $f(x) \propto \exp(-x^2/2)/(1+x^2+x^4)$ is known up to a normalising constant, $f$ is the marginal (in $x$) of the joint distribution $\tilde{f}(x, u) \propto \exp(-x^2/2)\,\mathbb{I}(u(1+x^2+x^4) \le 1)$, when $u$ is restricted to $(0, 1)$. The corresponding slice sampler then consists in simulating

$$U|X = x \sim \mathcal{U}(0, 1/(1+x^2+x^4))$$

and

$$X|U = u \sim \mathcal{N}(0, 1)\,\mathbb{I}(1+x^2+x^4 \le 1/u),$$

the latter being a truncated normal distribution. As shown by Figure 2, the outcome of the resulting Gibbs sampler perfectly fits the target density, while the convergence of the expectation of $X^3$ under $f$ has a behaviour quite comparable with the iid setting.
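The two conditional simulations above can be sketched as follows. The constraint $1+x^2+x^4 \le 1/u$ is a quadratic inequality in $x^2$, so it amounts to $|x| \le b$ with $b^2 = (-1+\sqrt{1+4(1/u-1)})/2$, and the truncated normal is then drawn by inverting the normal cdf on $[-b, b]$. The inversion step, the use of `statistics.NormalDist`, the function name and the chain length are choices made for this sketch and are not described in the note.

```python
import math
from statistics import NormalDist
import numpy as np

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf

def slice_sampler(x0, n_iter, rng):
    """Two-step Gibbs (slice) sampler for f(x) ~ exp(-x^2/2)/(1 + x^2 + x^4)."""
    chain = np.empty(n_iter)
    x = x0
    for t in range(n_iter):
        # U | X = x ~ Uniform(0, 1/(1 + x^2 + x^4)); 1 - random() is in (0, 1].
        u = (1.0 - rng.random()) / (1.0 + x**2 + x**4)
        # X | U = u ~ N(0, 1) restricted to x^2 + x^4 <= 1/u - 1, i.e. |x| <= b.
        c = max(1.0 / u - 1.0, 0.0)
        b = math.sqrt((-1.0 + math.sqrt(1.0 + 4.0 * c)) / 2.0)
        lo, hi = STD_NORMAL.cdf(-b), STD_NORMAL.cdf(b)
        p = lo + (hi - lo) * rng.random()
        p = min(max(p, 1e-15), 1.0 - 1e-15)  # numerical guard for inv_cdf
        x = STD_NORMAL.inv_cdf(p)
        chain[t] = x
    return chain

rng = np.random.default_rng(3)  # arbitrary seed
chain = slice_sampler(x0=0.0, n_iter=20_000, rng=rng)
print(f"estimate of E_f[X^3]: {np.mean(chain**3):.4f}")
print(f"estimate of Var_f(X): {np.var(chain):.4f}")
```

Every iteration produces a new value without any rejection step, which is one reason why the convergence paths of this sampler stay close to those of an iid sample.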

While the Gibbs sampler first appears as the natural solution to solve a simulation problem in complex models, if only because it stems from the true target $f$, as exhibited by the widespread use of BUGS (Lunn et al. 2000), which mostly focuses on this approach, the infinite variations offered by the Metropolis-Hastings schemes offer much more efficient solutions when the proposal $q(y|x)$ is appropriately chosen. The basic choice of a random walk proposal ($q(y|x)$ being then a normal density centred at $x$) can be improved by exploiting some features of the target as in Langevin algorithms (see Robert and Casella 2004, section 7.8.5) and Hamiltonian or hybrid alternatives (Duane et al. 1987; Neal 1999) that build upon gradients. More recent proposals include particle learning about the target and sequential improvement of the proposal (Douc et al. 2007; Rosenthal 2007; Andrieu et al. 2010). Figure 3 reproduces Figure 2 for a random walk Metropolis-Hastings algorithm whose scale is calibrated towards an acceptance rate of 0.5. The range of the convergence paths is clearly wider than for the Gibbs sampler, but the fact that this is a generic algorithm applying to any target (instead of a specialised version as for the Gibbs sampler) must be borne in mind.

Figure 3: (left) Random walk Metropolis-Hastings sampling approximation to the distribution $f(x) \propto \exp(-x^2/2)/(1+x^2+x^4)$ against the true density for a scale of 1.2 corresponding to an acceptance rate of 0.5; (right) range of convergence of the approximation to $\mathbb{E}_f[X^3] = 0$ against the number of iterations using 100 independent runs of the Metropolis-Hastings sampler, along with a single Metropolis-Hastings run.

Another major improvement generated by a statistical imperative is the development of variable dimension generators that stemmed from Bayesian model choice requirements, the most important example being the reversible jump algorithm in Green (1995), which had a significant impact on the study of graphical models (Brooks et al. 2003).

Some uses of Monte Carlo in Statistics

The impact of Monte Carlo methods on Statistics was not truly felt until the early 1980's, with the publication of Rubinstein (1981) and Ripley (1987),



but Monte Carlo methods have now become invaluable in Statistics because they allow us to address optimisation, integration and exploration problems that would otherwise be unreachable. For instance, the calibration of many tests and the derivation of their acceptance regions can only be achieved by simulation techniques. While integration issues are often linked with the Bayesian approach (since Bayes estimates are posterior expectations like

$$\int h(\theta)\pi(\theta|x)\,\mathrm{d}\theta$$

and Bayes tests also involve integration, as mentioned earlier with the Bayes factors), and optimisation difficulties with the likelihood perspective, this classification is by no means tight (as for instance when likelihoods involve unmanageable integrals) and all fields of Statistics, from design to econometrics, from genomics to psychometry and environmics, now have to rely on Monte Carlo approximations. A whole new range of statistical methodologies have entirely integrated the simulation aspects. Examples include the bootstrap methodology (Efron 1982), where multilevel resampling is not conceivable without a computer, indirect inference (Gourieroux et al. 1993), which constructs a pseudo-likelihood from simulations, MCEM (Cappé and Moulines 2009), where the E step of the EM algorithm is replaced with a Monte Carlo approximation, or the more recent approximated Bayesian computation (ABC) used in population genetics (Beaumont et al. 2002), where the likelihood is not manageable but the underlying model can be simulated from.
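To make the last example concrete, here is a minimal sketch of an ABC rejection sampler, under assumptions chosen for the example only: the data are normal with unknown mean $\theta$ and unit variance, the prior on $\theta$ is $\mathcal{N}(0, 3^2)$, the summary statistic is the sample mean, and a simulated data set is kept when its mean falls within a tolerance of the observed one. This toy likelihood is of course tractable; it is used only because the exact posterior mean is then available as a check. The function name and all numerical settings are hypothetical.

```python
import numpy as np

def abc_rejection(x_obs, n_prior_draws, tolerance, prior_sd, rng):
    """Keep prior draws whose simulated sample mean is close to the observed one."""
    n = x_obs.size
    s_obs = x_obs.mean()
    theta = rng.normal(0.0, prior_sd, size=n_prior_draws)
    # One simulated data set of size n per prior draw (rows of the array).
    simulated = rng.normal(theta[:, None], 1.0, size=(n_prior_draws, n))
    s_sim = simulated.mean(axis=1)
    keep = np.abs(s_sim - s_obs) < tolerance
    return theta[keep]

rng = np.random.default_rng(4)  # arbitrary seed
x_obs = rng.normal(1.0, 1.0, size=20)  # synthetic "observed" data
prior_sd = 3.0

accepted = abc_rejection(x_obs, n_prior_draws=100_000, tolerance=0.05,
                         prior_sd=prior_sd, rng=rng)

# Exact conjugate posterior mean, available here only because the model is a toy.
n = x_obs.size
exact_post_mean = x_obs.mean() * (n * prior_sd**2) / (n * prior_sd**2 + 1.0)

print(f"accepted draws: {accepted.size}")
print(f"ABC posterior mean:   {accepted.mean():.4f}")
print(f"exact posterior mean: {exact_post_mean:.4f}")
```

The likelihood is never evaluated: the algorithm only requires the ability to simulate data given $\theta$. A smaller tolerance brings the accepted draws closer to the exact posterior at the cost of fewer accepted values.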

In the past fifteen years, the collection of real problems that Statistics can [afford to] handle has truly undergone a quantum leap. Monte Carlo methods and in particular MCMC techniques have forever changed the emphasis from "closed form" solutions to algorithmic ones, expanded our impact to solving "real" applied problems while convincing scientists from other fields that statistical solutions were indeed available, and led us into a world where "exact" may mean "simulated". The size of the data sets and of the models currently handled thanks to those tools, for example in genomics or in climatology, is something that could not have been conceived 60 years ago, when Ulam and von Neumann invented the Monte Carlo method.